Grammar based statistical MT on Hadoop: An end-to-end toolkit for large scale PSCFG based MT

Authors

  • Ashish Venugopal
  • Andreas Zollmann
Abstract

This paper describes the open-source Syntax Augmented Machine Translation (SAMT) on Hadoop toolkit, an end-to-end grammar based statistical machine translation framework running on the Hadoop implementation of the MapReduce programming model. We present the underlying methodology of the SAMT approach with detailed instructions that describe how to use the toolkit to build grammar based systems for large scale translation tasks.

Similar resources

A Systematic Comparison of Phrase-Based, Hierarchical and Syntax-Augmented Statistical MT

Probabilistic synchronous context-free grammar (PSCFG) translation models define weighted transduction rules that represent translation and reordering operations via nonterminal symbols. In this work, we investigate the source of the improvements in translation quality reported when using two PSCFG translation models (hierarchical and syntax-augmented), when extending a state-of-the-art phraseb...

New Parameterizations and Features for PSCFG-Based Machine Translation

We propose several improvements to the hierarchical phrase-based MT model of Chiang (2005) and its syntax-based extension by Zollmann and Venugopal (2006). We add a source-span variance model that, for each rule utilized in a probabilistic synchronous context-free grammar (PSCFG) derivation, gives a confidence estimate in the rule based on the number of source words spanned by the rule and its ...

Gappy Pattern Matching on GPUs for On-Demand Extraction of Hierarchical Translation Grammars

Grammars for machine translation can be materialized on demand by finding source phrases in an indexed parallel corpus and extracting their translations. This approach is limited in practical applications by the computational expense of online lookup and extraction. For phrase-based models, recent work has shown that on-demand grammar extraction can be greatly accelerated by parallelization on ...

An Efficient Two-Pass Approach to Synchronous-CFG Driven Statistical MT

We present an efficient, novel two-pass approach to mitigate the computational impact resulting from online intersection of an n-gram language model (LM) and a probabilistic synchronous context-free grammar (PSCFG) for statistical machine translation. In first pass CYK-style decoding, we consider first-best chart item approximations, generating a hypergraph of sentence spanning target language ...

Machine Translation Strategies: A Comparison of F-Structure Transfer and Semantically Based Interlingua

Two machine translation (MT) systems which respectively utilize the transfer and interlingua strategies will be presented and compared, emphasizing design principles. Feature structures and unification-based grammar are common denominators for the two MT systems; in particular, both make use of Lexical-Functional Grammar (LFG). In the transfer system. Machine Translation Toolkit, developed by E...

Journal title:
  • Prague Bull. Math. Linguistics

Volume 91

Publication date: 2009